Contributor

@gilbertococchi gilbertococchi commented Sep 26, 2023

OneTrust CMP detection is buggy: it currently reports www.google.com as using OneTrust, which is not the case.

https://www.webpagetest.org/jsonResult.php?test=230926_AiDcK7_97Z&highlight=1

I would suggest removing this match for OneTrust to solve the issue.

@rviscomi @pmeenan FYI

The DNS TXT match seems to be buggy in Wappalyzer: it matches domains such as www.google.com, flagging them as using OneTrust or other technologies that are not actually in use, such as "Apple iCloud Mail".
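
To illustrate the false positive, here is a minimal sketch. It assumes the rule is a simple pattern match against the domain's TXT records; the pattern and the helper name `matches_txt` are hypothetical and not taken from Wappalyzer. The records are a subset of the `dig` output quoted further down in this conversation.

```python
import re

# Hypothetical pattern; the real technology definition may differ.
ONETRUST_TXT_PATTERN = re.compile(r"onetrust-domain-verification")

# A few TXT records copied from the dig output quoted in this thread.
GOOGLE_TXT_RECORDS = [
    "v=spf1 include:_spf.google.com ~all",
    "apple-domain-verification=30afIBcvSuDV2PLX",
    "onetrust-domain-verification=de01ed21f2fa4d8781cbc3ffb89cf4ef",
]


def matches_txt(records, pattern):
    """Return True if any TXT record matches the pattern."""
    return any(pattern.search(record) for record in records)


# The rule fires on the verification record alone, whether or not the
# page under test actually loads OneTrust.
flagged = matches_txt(GOOGLE_TXT_RECORDS, ONETRUST_TXT_PATTERN)
print(flagged)
```

A domain-level verification record only shows that someone proved ownership of the domain to the vendor, so a match here says nothing about the specific page being tested.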
@tunetheweb
Member

As discussed offline, someone has verified OneTrust against google.com, as you can see in DNS:

11:25:53 ~ $ dig -t txt google.com 
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.10.6 <<>> -t txt google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58666
;; flags: qr rd ra; QUERY: 1, ANSWER: 12, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;google.com.			IN	TXT

;; ANSWER SECTION:
google.com.		3600	IN	TXT	"docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"
google.com.		3600	IN	TXT	"globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8="
google.com.		3600	IN	TXT	"atlassian-domain-verification=5YjTmWmjI92ewqkx2oXmBaD60Td9zWon9r6eakvHX6B77zzkFQto8PQ9QsKnbf4I"
google.com.		3600	IN	TXT	"google-site-verification=TV9-DBe4R80X4v0M4U_bd_J9cpOJM0nikft0jAgjmsQ"
google.com.		3600	IN	TXT	"apple-domain-verification=30afIBcvSuDV2PLX"
google.com.		3600	IN	TXT	"webexdomainverification.8YX6G=6e6922db-e3e6-4a36-904e-a805c28087fa"
google.com.		3600	IN	TXT	"v=spf1 include:_spf.google.com ~all"
google.com.		3600	IN	TXT	"MS=E4A68B9AB2BB9670BCE15412F62916164C0B20BB"
google.com.		3600	IN	TXT	"docusign=1b0a6754-49b1-4db5-8540-d2c12664b289"
google.com.		3600	IN	TXT	"onetrust-domain-verification=de01ed21f2fa4d8781cbc3ffb89cf4ef"
google.com.		3600	IN	TXT	"facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95"
google.com.		3600	IN	TXT	"google-site-verification=wD8N7i1JTNTkezJ49swvWW48f8_9xveREV4oB-0Hf5o"

;; Query time: 1034 msec
;; SERVER: 192.168.107.1#53(192.168.107.1)
;; WHEN: Tue Sep 26 11:26:00 IST 2023
;; MSG SIZE  rcvd: 885

As it's not used on ALL google.com properties, I agree it's not a great check to rely on that DNS entry. The other checks are probably better, so I agree with removing this one.
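
The removal amounts to dropping the DNS rule while leaving the other checks in place. A rough sketch follows, assuming a dictionary-shaped technology definition; the key names, the values, and the helper `without_dns_match` are illustrative assumptions, not the actual definition file.

```python
import copy

# Hypothetical technology entry; key names and values are illustrative only.
TECHNOLOGIES = {
    "OneTrust": {
        "scriptSrc": ["example-page-level-pattern"],
        "dns": {"TXT": ["onetrust-domain-verification"]},
    }
}


def without_dns_match(technologies, name):
    """Return a copy of the definitions with the `dns` rule removed for `name`."""
    updated = copy.deepcopy(technologies)
    # pop() with a default is a no-op if the entry has no `dns` rule.
    updated.get(name, {}).pop("dns", None)
    return updated


patched = without_dns_match(TECHNOLOGIES, "OneTrust")
print(sorted(patched["OneTrust"]))
```

Working on a copy keeps the original definitions available for comparison, which helps when checking what has diverged from upstream.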

This is, however, our first divergence from Wappalyzer since they made it closed source. Are we happy to diverge? There was some talk of others maintaining an open-source version, but I'm not sure if that will happen (or if they realise how much work it will be!)

Member

@rviscomi rviscomi left a comment

I'm ok to diverge for the time being. If and when we settle on a better solution we can merge any changes upstream.

@gilbertococchi
Contributor Author

@rviscomi friendly ping: can we remove this DNS match on OneTrust to exclude these outliers?

Thanks
Gil

@tunetheweb tunetheweb merged commit fe8c861 into HTTPArchive:main Nov 10, 2023
@gilbertococchi gilbertococchi deleted the patch-2 branch November 10, 2023 09:37
max-ostapenko added a commit that referenced this pull request Apr 5, 2025
max-ostapenko added a commit that referenced this pull request Apr 25, 2025
* lint

* cleanup

* npm

* test error

* test error #2

* remove errors

* reset schema changes

* standard and puppeteer

* editorconfig

* updated eslint config

3 participants